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ABSTRACT 

In this paper, we analyze the complexity of natural parallelizations of Delaunay refinement methods for mesh gener- 
ation. The parallelizations employ a simple strategy: at each iteration, they choose a set of "independent" points to 
insert into the domain, and then update the Delaunay triangulation. We show that such a set of independent points 
can be constructed efficiently in parallel and that the number of iterations needed is O(log (L/s)), where L is the 
diameter of the domain, and s is the smallest edge in the output mesh. In addition, we show that the insertion of 
each independent set of points can be realized sequentially by Ruppert's method in two dimensions and Shewchuk's 
in three dimensions. Therefore, our parallel Delaunay refinement methods provide the same element quality and 
mesh size guarantees as the sequential algorithms in both two and three dimensions. For quasi-uniform meshes, such 
as those produced by Chew's method, we show that the number of iterations can be reduced to 0(log(L/s)). To 
the best of our knowledge, these are the first provably polylog(L/s) parallel time Delaunay meshing algorithms that 
generate well-shaped meshes of size optimal to within a constant. 

Keywords: Delaunay refinement, simplicial meshes, parallel algorithms, computational geometry 



1. INTRODUCTION 

Delaunay refinement is a popular and practical tech- 
nique for generating well-shaped unstructured meshes 
[19, 29, 34]. The first step of a Delaunay refinement 
algorithm is the construction of a constrained or con- 
forming Delaunay triangulation of the input domain. 
This initial Delaunay triangulation need not be well- 
shaped. Delaunay refinement then iteratively adds 
new points to the domain to improve the quality of 
the mesh and to make the mesh respect the bound- 
ary of the input domain. A sequential Delaunay re- 
finement algorithm typically adds one new vertex per 
iteration, although sometimes one may prefer to in- 
sert more than one new vertex at each iteration. Each 
new point or set of points is chosen from a set of po- 
tential candidates — the circumcenters of poorly con- 
ditioned simplices (to improve mesh quality) and the 
diameter-centers of boundary simplices (to conform to 
the domain boundary). Ruppert [29] was the first to 
show that the proper application of Delaunay refine- 



ment produces well-shaped meshes in two dimensions 
whose size is within a small constant factor of the best 
possible. Ruppert's result was then extended to three 
dimensions by Shewchuk [34] and Li and Teng [19]. 
Efficient sequential Delaunay refinement software has 
been developed both in two [29, 33] and three dimen- 
sions [34]. Chrisochiedes and Nave [9] and Okusanya 
and Peraire [27] developed parallel software using De- 
launay refinement, for which they have reported good 
performance. Recently, Nave et al. [26] presented a 
parallel Delaunay refinement algorithm and proved 
that it produces well-shaped meshes. The complex- 
ity of their algorithm as well as the size of the mesh it 
outputs remains unanalyzed. 

In this paper, we study the parallel complexity of a 
natural parallelization of Delaunay refinement. One 
of the main ingredients of our parallel method is a 
notion of independence among potential candidates 
for Delaunay insertion at each iteration. Our parallel 
Delaunay method performs the following steps during 
each iteration. 
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1. Generate an independent set of points for parallel 
insertion; 

2. Update the Delaunay triangulation in parallel. 
Our independent sets have the following properties: 

• Their insertion can be realized sequentially by 
Ruppert's method in 2D and Shewchuk's in 3D. 
Hence, an algorithm that inserts all their points 
in parallel will inherit the guarantees of Rup- 
pert's and Shewchuk's methods that the output 
mesh be well-shaped and have size optimal up to 
a constant. 

• The independent sets can be generated efficiently 
in parallel. In addition, they are "large enough" 
so that the number of parallel iterations needed is 
0(log 2 (L/s)), where L and s are the diameter of 
the domain and the smallest edge in the output 
mesh, respectively. 

• When a quasi-uniform mesh is desired as in 
Chew's method, the number of iterations can be 
reduced to 0(log(L/s)). 

We should emphasize here that our analysis focuses 
on the number of parallel iterations of Delaunay re- 
finement. The independence of the new points do 
not necessarily imply a straightforward parallel inser- 
tion scheme at each iteration. There are several ex- 
isting parallel Delaunay triangulation algorithms that 
we can employ at each iteration. For example, in 2D 
we can use the divide-and-conquer parallel algorithm 
developed by Blelloch et al. [4] for Delaunay trian- 
gulation. Their algorithm uses O(nlogn) work and 
O(log 3 n) parallel time. We can alternatively use the 
randomized parallel algorithms of Reif and Sen [28], 
or by Amato et al. [1], in both two and three dimen- 
sions. Both of these randomized parallel Delaunay tri- 
angulation algorithms have expected parallel running 
time O(logn). Using one of these adds a logarithmic 
factor to our worst-case total parallel time complex- 
ity analysis. To the best of our knowledge, these are 
the first provably polylog(L/s) parallel time Delaunay 
meshing algorithms that generate well-shaped meshes 
of size optimal to within a constant. 

1.1 Motivation and Related Work 

This work is motivated by the observation that both 
sequential and parallel implementations of Delau- 
nay refinement algorithms seem to produce the best 
meshes in practice. However, improvements in the 
speed of parallel numerical solvers are creating the 
need for comparable speedups in meshing software: 
Lohner and Cebral [20] have reported that improve- 
ments in parallel numerical solvers [37] have resulted 
in the simulation time of numerous practical systems 
being dominated by the meshing process. 



Quadtree-based methods are an alternative to De- 
launay refinement. They also generate well-shaped 
meshes whose size is within a constant factor of the 
best possible [2, 24]. In practice, however, they of- 
ten generate meshes larger than Delaunay refinement 
on the same input. The parallel complexity of the 
quadtree-based methods is nevertheless better under- 
stood. 

Several parallel mesh generation algorithms have been 
developed. On the theoretical extreme, Bern, Epp- 
stein and Teng [3] gave a parallel O(logn) time al- 
gorithm using K/logn processors to compute a well- 
shaped quadtree mesh, where K is the final mesh size. 
There is also a simple level-by-level quadtree-based 
method that is used in practice [32, 36]. One can eas- 
ily show that this level-by-level based method takes 
0(log(L/s) + K/p) parallel time, using p processors 
[36]. 

Building upon [3], Miller et al. [22] developed a paral- 
lel sphere-packing based Delaunay meshing algorithm 
that generates well-shaped Delaunay meshes of opti- 
mal size in O(logn) parallel time using K/logn pro- 
cessors. Their method uses a parallel maximal inde- 
pendent set algorithm [21] to directly generate the set 
of final mesh points, and then constructs the Delau- 
nay mesh using parallel Delaunay triangulation. As 
this algorithm has not been implemented, we do not 
know how the meshes it produces will compare. 

Various parallel Delaunay refinement methods have 
been implemented and been seen to have good per- 
formance [9, 18, 20, 27]. These methods address some 
important issues such as how to partition the domain 
so as to minimize the communication cost among the 
processors. Our new analysis on the number of paral- 
lel iterations of Delaunay refinement could potentially 
provide provable bounds on their parallel complexity. 

Our work also helps explain the performance of some 
sequential implementations of Delaunay refinement, 
especially those which use a Delaunay triangulator as 
a black-box. In such situations, it is often desirable to 
minimize the number of calls to the black-box Delau- 
nay triangulator by inserting multiple points at each 
iteration. Our bounds on the number of iterations pro- 
vide a bound on the number of calls to the Delaunay 
triangulator. 

We omit the proofs of Lemmas 8, 9, 16, 10, and 
18 and Theorems 13, 14, 21, and 22 in this version 
due to page limitation. A full version of the pa- 
per is available at http://www.cs.duke.edu/~ungor/ 
abstracts/parallelDelRef .html. 



Parallel Delaunay Refinement: Algorithms and Analyses 



3 



2. PRELIMINARIES 

2.1 Input Domain 

In 2D, the input domain CI is represented as a planar 
straight line graph (PSLG) [29] — a proper planar 
drawing in which each edge is mapped to a straight 
line segment between its two endpoints. The segments 
express the boundaries of O. and the endpoints are the 
vertices of O. The vertices and boundary segments of 
CI will be referred to as input features of CI. A vertex 
is incident to a segment if it is one of the endpoints of 
the segment. Two segments are incident if they share 
a common vertex. In general, if the domain is given as 
a collection of vertices only, then the boundary of its 
convex hull is taken to be the boundary of the input. 

Miller et al. [23] presented a natural extension of 
PSLGs, called piecewise linear complexes (PLCs), to 
describe domains in three and higher dimensions. In 
three dimensions, the domain O is a collection of ver- 
tices, segments, and facets where (i) all lower dimen- 
sional elements on the boundary of an element in CI 
also belongs to CI, and (ii) if any two elements in- 
tersect, then their intersection is a lower dimensional 
element in CI. In other words, a PLC in d dimensions 
is a cell complex with polyhedral cells from to d 
dimensions. 

2.2 Delaunay Triangulation 

Let P be a point set in R d . A simplex t formed by a 
subset of P points is a Delaunay simplex if there exists 
a circumsphere of t whose interior does not contain 
any points in P. The Delaunay triangulation of P, 
denoted Del(P), is a PLC that contains all Delaunay 
simplices. If the points are in general position, that is, 
if no d + 2 points in P are co-spherical, then Del(P) is 
a simplicial complex. 

The Delaunay triangulation of a point set can be con- 
structed in O(nlogn) time in 2D [10, 17, 16] and in 
0(n rd/21 ) time in d dimensions [10, 31]. A nice survey 
of these algorithms can be found in [16]. 

One way to obtain a triangulation that conforms to 
the boundary of a PSLG domain is to use a Con- 
strained Delaunay triangulation. Let P be the set 
of vertices of a PSLG 0_. Two points p and q in P 
are said to be visible from each other if the line seg- 
ment pq does not intersect the interior of any segment 
in 0_. Three points form a constrained Delaunay tri- 
angle if the interior of their circumcircle contains no 
point from P that is visible from all three points. The 
union of all constrained Delaunay triangles forms a 
constrained Delaunay triangulation CDT(O). Chew 
developed an algorithm for computing constrained De- 
launay triangulations [8]. 




Figure 1. Circumcenter of triangle abc encroaches the seg- 
ment pq. 

A Delaunay triangulation T of input and Steiner points 
is a conforming Delaunay triangulation of a PLC O. 
if every face of O is a union of faces of T. In 2D, Edels- 
brunner and Tan proved that O (n ) additional points 
are sufficient to generate a conforming triangulation of 
a PSLG of complexity n [15]. A 2D solution proposed 
by Saalfeld [30] is extended to 3D by Murphy et al. 
[25] and Cohen-Steiner et al. [11]. However, it remains 
open whether the size of their output is polynomial in 
the input size or local feature size. The definition of 
local feature size will be given in Section 3. When 
the angle between the faces of a PLC is bounded from 
below, say for example by n/2, then one can apply De- 
launay refinement to generate well-shaped conforming 
triangulations whose size is close to optimal both in 
two [6, 29] and three dimensions [7, 19, 34]. 

3. 2D SEQUENTIAL DELAUNAY 
REFINEMENT 

In this section, we recall Ruppert's and Chew's algo- 
rithms for constructing Delaunay meshes of PLSGs in 
2D. Following Ruppert [29], we assume that the angle 
between two adjacent input segments is at least n/2. 
Boundary treatments that relax this assumption are 
discussed in [29, 35]. 

In the process of Delaunay refinement, one could either 
maintain a constrained Delaunay triangulation, or one 
just keeps track of the set of input segments that are 
not respected. The first approach does not extend 
to three dimensions because, in 3D, some PLCs do 
not have a constrained Delaunay triangulation. We 
therefore use the second approach. 

At each iteration, we choose a new point for insertion 
from a set of candidate points. There are two kinds 
of candidate points: (1) the circumcenters of existing 
triangles, and (2) the midpoints of existing boundary 
segments. 

Let the diametral circle of a segment be the circle 
whose diameter is the segment. A point is said to en- 
croach a segment if it is inside the segment's diametral 
circle. (See Figure 1.) 
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At iteration t, the circumcenter of a triangle is a po- 
tential candidate for insertion if the triangle is poorly 
shaped. For example, in Ruppert's algorithm, a tri- 
angle is considered poorly shaped if the ratio of its 
circumradius to the length of its shortest side is larger 
than a pre-specified constant |3r > \/2. Let c' ' de- 
note the set of all potential candidate circumcenters 
that do not encroach any segment. Let C (l ' denote 
their corresponding circumcircles. Similarly, let ' 
denote the set of all potential candidate circumcenters 
that do encroach some segment. Let denote their 
corresponding circumcircles. 

The midpoint of a boundary segment is a candidate for 
insertion if (1) the segment is not part of the current 
Delaunay triangulation, that is, its diametral circle 
is encroached by some existing mesh points, or (2) 
its segment is encroached by a circumcenter in B. In 
the latter case, this potential circumcenter candidate 
is rejected from insertion. Let T>\ ' be all midpoint 
candidates of type (1) and let 2?g' be all midpoint 
candidates of type (2). 



Algorithm 1 Sequential Delaunay Refinement 

Input: A PSLG domain O in K 2 

Let T be the Delaunay triangulation of the vertices 
of O. Let i = and compute B li] , C m , V l j ] , and 

U B ' 

while C (i) U V ( j ] U B [i] is not empty do 

Choose a point q from c' ' U t>j ' U X>g' and in- 
sert q into the triangulation. If q is a midpoint 
of a segment s, remove s from the segment list 
and replace it with two segments from q to each 
endpoint of s; 

Update the Delaunay triangulation T; i = i + 1 ; 
Compute B , C , T>j ' , and T>g . 
end while 



The points inserted by the Delaunay refinement are 
often called Sterner points. 

If a quasi-uniform mesh, such as that produced by 
Chew's method, is desired [6], then we use the fol- 
lowing notion of poorly shaped triangle: A triangle is 
poorly-shaped if the ratio of its circumradius to the 
length of the shortest edge in the current Delaunay 
triangulation T is more than a pre-specified constant 
Pc > V2. 

Figure 2 shows the output of the Delaunay refinement 
illustrating the difference between Chew's and Rup- 
pert's refinement. We call these two variants of the 
Delaunay refinement algorithm Chew 's algorithm and 
Ruppert's algorithm. 
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(a) (b) 

Figure 2. The output of (a) Chew's and (b) Ruppert's algo- 
rithm on the same input. Both of these meshes have minimum 
angle > 29°. The first mesh has 2246 and the second has 131 
elements. 

In their original papers [6, 29], Chew and Ruppert pre- 
sented their Delaunay refinement algorithms as partic- 
ular variations of Algorithm 1 — they specified how to 
choose the next point at each iteration from the set of 
candidates. In this paper, we will consider the follow- 
ing variation of Algorithm 1 which is more aggressive 
in adding boundary points — we choose this variation 
to parallelize because its analysis is relatively simpler 
to present. 

In this variation, B (i) , C (i) , and V { r l) are the same as 
in Algorithm 1. The set T>^ is built incrementally. At 
iteration i, we compute B {i] first and let V ii] be the 
set of diametral circles that are encroached by some 
circumcenters of B [i) . We then set V 1 ^ = £>g _1) U 

In other words, if a segment is encroached by a circum- 
center of a poorly-shaped Delaunay triangle, its mid- 
point will be added to the set of candidate midpoints 
and remains candidate thereafter. This is in contrast 
with Algorithm 1, in which an encroached midpoint is 
added to the set of candidate midpoints only for the 
next iteration. If another candidate is chosen that is 
in a circumcircle whose center encroaches the segment, 
the circumcircle will no longer be in the Delaunay tri- 
angulation at the end of the iteration, and hence the 
segment might not be encroached in the triangulation 
at the end of the iteration. So, its midpoint might not 
be a candidate in future iterations. 

Assuming that the angle between two adjacent input 
segments is at least 7t/2, Chew's algorithm terminates 
with well-shaped quasi-uniform meshes, while Rup- 
pert's algorithm [29] terminates with a well-shaped 
Delaunay mesh of the input domain whose elements 
adapt to the local geometry of the domain. The 
number of triangles in the mesh generated by Rup- 
pert's algorithm is asymptotically optimal up to a con- 
stant. The proofs of Ruppert's and Chew's [6, 29] that 
their algorithms terminate with a well-shaped mesh of 
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size within a constant factor of optimal can be easily 
extended to our variation of Algorithm 1 discussed 
above. We refer interested readers to [29] and [35]. 
Here we give a high level argument and introduce an 
important concept that will be used in Section 4.2.3 
for preprocessing an input domain in parallel. 

Given a domain CI, the local feature size of each point 
x in CI, denoted by lfsn (x), is the radius of the small- 
est disk centered at x that touches two non-incident 
input features. Ruppert showed that every Delaunay 
triangle in the final mesh is well-shaped and that the 
length of the longest edge in each Delaunay triangle 
is within a constant factor of lfsn(x) for each x in the 
interior of the triangle. 

Suppose M. is a mesh generated by our variation of 
Algorithm 1. Let Cl' be the domain obtained from 
Cl by adding to Cl all mesh points in M that are on 
the boundary segments of CI. Then we can show (i) 
for all x in CI, lfsn(x) and lfs n /(x) are within a small 
constant factor of each other; and (ii) M. can be ob- 
tained by applying Ruppert 's (or Chew's) variations 
of Algorithm 1 to CI'. Therefore, the mesh produced 
by our variation of Algorithm 1 has size within a small 
constant factor of the one generated by Ruppert's (or 
Chew's) refinement method. 

4. PARALLEL 2D DELAUNAY 
REFINEMENT 

To better illustrate our analysis of parallel Delaunay 
refinement, we first focus on the case in which the 
input is a periodic point set (PPS) as introduced by 
Cheng et al. [5]. See also [12]. We will then extend 
our results to produce boundary conforming meshes 
when the input domain is a PSLG. 

4.1 Input Domain: Periodic Point Sets 

If P is a finite set of points in the half open unit square 
[0, 1 ) and Z is the two dimensional integer grid, then 
S = P + Z is a periodic point set [12]. The periodic 
set S contains all points p + v, where p G P and v 
is an integer vector. The Delaunay triangulation of a 
periodic point set is also periodic. 

As P is contained in the unit square, the diameter of 
r is L < y/l. When we refer to the diameter of a 
periodic point set, we will mean the diameter of P. 

4.1.1 A generic parallel algorithm (PPS) 

For a periodic point set, the only candidates for inser- 
tion are the circumcenters of poorly shaped triangles. 
We need a rule for choosing a large subset of the can- 
didates with the property that a sequential Delaunay 



refinement algorithm would insert each of the points 
in the subset. Our rule is derived from the following 
notion of independence among candidates. 

Definition 1 (Independence). Two circumcenters c 
and c' (and also the corresponding circles c and c'J 
are conflicting if both c and c' contain each other's 
center. Otherwise, c and c' (respectively c and c') 
are said to be independent. 

If two candidates conflict, at most one of them can be 
inserted. Our rule is to insert a maximal independent 
set (MIS) of candidates at each iteration. We will 
show that if an algorithm follows this rule, then it will 
terminate after a polylogarithmic number of rounds. 

Algorithm 2 Generic Parallel Delaunay Refinement 

Input: A periodic point set P in R 2 

Let T be the Delaunay triangulation of P 
Compute C, the set of all candidate circumcenters 
in T 

while C is not empty do 

Let 1 be an independent subset of C 

Insert all the points in I in parallel 

Update T and C 
end while 



In the next few subsections, we will discuss how to 
generate the independent sets used by the algorithm. 
But first, we prove that regardless of how one chooses 
the independent set, our parallel algorithm can be se- 
quentialized. This implies that the algorithm inher- 
its the guarantee of its sequential counterpart that it 
generates a well-shaped mesh of size that is within a 
constant factor of optimal. 

Theorem 2. Suppose M. is a mesh produced by an ex- 
ecution of the Generic Parallel Delaunay Refinement 
algorithm. Then M can be obtained by some exe- 
cution of one of the sequential Delaunay refinement 
algorithms discussed in Section 3. 

Proof: Let T\ ,Tz, . . . ,Ty_ be the sets of vertices in- 
serted by the parallel algorithm above at iterations 
1 , . . . , k, respectively. We describe a sequential execu- 
tion that inserts all the points in I\ before any point 
of I) for i < j . For each independent set Ii , we insert 
the candidates according to their circumradius in the 
order from largest to smallest. For any two circum- 
centers d, b 6 Ii, assume that the radius of a is larger 
than the radius of b. This implies that a can not be 
in the circumcircle of b, because d and b are indepen- 
dent. Therefore, the insertion of d will not eliminate 
the triangle of b . 

Furthermore, observe that in any sequential execution, 
the insertion of point p 6 Ii can not eliminate the 
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triangle corresponding to q 6 lj for any i < j, for 
otherwise, q would not exist in the j th iteration of the 
parallel execution. 

Therefore, the parallel and sequential executions ter- 
minate with the same Delaunay mesh. □ 

To minimize the number of iterations, intuitively, we 
should choose a maximal independent set of candi- 
dates at each iteration. In Section 4.1.3, we will give 
a geometric algorithm that computes a maximal inde- 
pendent set of candidates efficiently in parallel. Our 
algorithm makes use of the following observation. 

Lemma 3. Suppose c a and Cb are two conflicting cir- 
cumcircles at iteration i, and let r u and Tb be their 
circumradii. Then r b /2 < r a < 2r b . 

4.1.2 Parallelizing Chew's Refinement (PPS) 

In this section, we show that our parallel implemen- 
tation of Chew's refinement only needs 0(log(L/s)) 
iterations. The basic argument is very simple — we 
will show that the radius of the largest Delaunay circle 
reduces by a factor of 3/4 after some constant num- 
ber (e.g., 98) of iterations. Because the largest cir- 
cumradius initially is O(L) and the largest circumra- 
dius in the final mesh is O(s), the iteration bound of 
0(log(L/s)) follows immediately. 

Lemma 4. For all i, let T\ be the largest circumradius 
of a triangle in the Delaunay triangulation at the end 
of iteration i. For all k > 98, < 3rk-98/4. 

Proof: We assume by way of contradiction that rk > 
3ric_9g/4. Let i = k— 98. Let Ck be a circumcircle 
with radius rk after iteration k. Let Ck be the center 
of c k . 

For j < k, it is clear that Ck is also an empty circle in 
iteration j, because the refinement process only adds 
new points. But, Ck might not be a circumcircle at 
iteration j. We now show that for each iteration j, 
where i < ) < k, there exists a circumcircle c\ with 
center i\, and radius x\ such that (1) ||Cj' — Ckll < 3ri/4 
and (2) r- > 3r t /4. 

Let Pk, qk and tk be the vertices of the Delaunay 
triangle at iteration k that defines Ck- We will alter 
Ck in three stages to produce a suitable c\ that is the 
circumcircle of three points that exist at stage j: pj, 
qj and tj . 

i. Dilate Ck until it touches a mesh point pj. Note 
that Pj might well be Pk, qk, or tk, so, Ck might 
not actually expand at all during this step. 

ii. Grow the circle by moving its center away from Pj 
along the ray PjCk, and maintaining the property 
that pj lies on the boundary of the circle, until it 
touches a mesh point qj. 



iii. Continue to grow the circle, maintaining its con- 
tact with Pj and q j , moving its center away from 
the chord Pj q j , until it touches a vertex tj . 

The resulting circle z\ is a circumcircle of a Delau- 
nay triangle p , q j tj at iteration ). Moreover, Pjqjtj is 
a poorly-shaped triangle because its circumradius r\ 
is at least Tk. Thus, its center c\ is a candidate at 
iteration j. Note also that r- > r k > 3r t /4. 

Consider the triangle pjCkCj', which is non-acute at 
vertex Ck- Let x = CicPjl and y = |CkCj'|. Since 
IZCj'ckPjl is non-acute (rj') > x + y . As tj is the 
radius of a Delaunay triangle and j > i, i*j < i\. 
Combining this fact with x > r k > 3^/4, we find 
r- < x + n/4. So we can write, (x + n/4) 2 > x 2 + y 2 . 
By simplifying this inequality to xr i( /2 + r 2 /16 > y 2 
and substituting x < r t , we derive 9r 2 /16 > y 2 . 
Hence, y = \\c- - Ckll < 3n/4. 

Because Cj is empty at the end of iteration ) , we know 
Cj' was not chosen during iteration ). Because the in- 
dependent set of candidates that we select is maximal, 
there must be another circumcircle c" chosen in iter- 
ation ) that conflicts with Cj'. By Lemma 3, the radius 
of Cj" is at least one half of the radius of Cj', and so is 
at least |i"j'|/2 > 3rt/8. Moreover, the radius of Cj" is 
at most Tj and hence at most r^. So |c" — Cj | < T\. 
Hence, 

l|Cj"-c k | < ||c" — Cj'H + Hcj' — dkll < n+3n/4 < 7n/4. 

Let C" = {c^i , 6(^21 • • • i c j"> • • • i c kl. an d let C" be 
the corresponding set of circumcircles. As c" is in- 
serted during round I for each 1, each circle c" G C" 
is empty of all the centers c(' for 1 < j. So the 
centers in C" are pairwise at least 3rt/8 away from 
each other. Thus, one can draw disjoint circles of ra- 
dius 3n/8 around each of these points. The annu- 
lus containing these disjoint circles has area at most 
7T(7/4 + 3/16) 2 r 2 -7T(3/4-3/16) 2 r 2 . So, one can pack 
at most 

7t[(7/4 + 3/16) 2 -(3/4-3/16) 2 ]r 2 
7t(3/16) 2 r 2 

disjoint circles of radius 3n/16 in this region. This 
implies C"| = k — i < 97, a contradiction. □ 

Theorem 5. Our parallel implementation of Chew's 
refinement algorithm takes at most [98 log 4/3 (L/s)] 
iterations. 

4.1.3 Parallel Computation of MIS 

One can use the parallel maximal independent set al- 
gorithm of Luby [21] to compute a parallel indepen- 
dent set of candidates for each iteration in O(log n) 
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parallel time. In this section, we will explain how we 
can exploit the geometric structure of the indepen- 
dence relation to compute a maximal independent set 
in a constant parallel time. 

We will make extensive use of the result of Lemma 
3 that two circumcircles are conflicting at iteration j 
only if their radii are within a factor of 2 of each other. 

Lemma 6. At iteration j, if there are rij circumcir- 
cles, then a maximal independent set of candidates 
for Delaunay refinement can be computed in constant 
parallel time using n.j processors. 

Proof: Let be the set of circumcircles of radius 
more than L/2 K+1 and less than or equal to L/2 H , 
where h ranges from to log(L/sj ) and Sj is the small- 
est circumradius at iteration ) . Note that a circumcir- 
cle in C' h does not conflict with any circumcircle in C} 
if I > h + 1 . 

To compute a maximal set of non-conflicting candi- 
dates, we first in parallel find a maximal independent 
sets of circumcircles in C^, independently for all even 
h. We will show below that a maximal independent 
set of circumcircles in can be computed in con- 
stant time in parallel. Let le Vcn be the set of inde- 
pendent circumcircles computed. Then in one parallel 
step, we can eliminate all conflicting circumcircles in 
UhioddCj^. We then compute a maximal independent 
set for remaining circumcircles in for all odd h. 
Let this set be l , odd - Then V even U I' odd is a maximal 
independent set of circumcircles for iteration j. 

Note that all circumcircles in C' h have radius between 
L/2 H+1 and L/2 H . If we divide the square containing 
all circumcenters into a 2 h -by-2 H grid, then any cir- 
cumcenter that is conflict with a circumcenter in the 
grid box (x,y) must lie either in grid box (x,y) or one 
of its eight grid neighbors. 

We color grid boxes (x,y) with color (x mod3,y 
mod 3). We then cycle through the 9 color classes 
and, by a method we will explain momentarily, find 
a maximal independent set of the candidates in each 
grid-box of the current color in parallel. We then elim- 
inate in parallel the conflicting circumcenters that are 
in the color classes that have not yet been processed. 

Finally, we explain how to compute a maximal inde- 
pendent set among the candidates that lie in a given 
grid-box. First notice that any maximal independent 
set of candidates in a grid-box can have at most a 
constant number of members, and hence a maximal 
independent set can be found by a constant number 
of parallel selection-elimination operations: choose a 
center that has not been eliminated, and in parallel 
eliminate any centers with which it conflicts. 



In a parallel system that supports primitives such 
fetch_and_add, test_and_set, or parallel scan, we can 
use such a primitive to select in constant time a candi- 
date in a grid box. The processor that holds this can- 
didate becomes a "leader" in that round and broad- 
casts its candidate so that the conflicting candidates 
can be eliminated. With these primitives, our algo- 
rithm can be implemented in parallel constant time. 
However, if the parallel system does not support these 
primitives, then for each grid cell we can emulate par- 
allel scan to select a leader in O(logn) time, where n 
is the number of candidate centers in the cell. 

In general, many grid cells are empty and there is no 
need to generate them at all. We can use hashing 
to select grid cells that are not empty. The idea is 
very simple, each candidate center can compute the 
coordinates of its grid cell from the coordinates of its 
center and its radius. We can hash grid cells using 
their coordinates and therefore, all candidate centers 
belonging to a grid cell can independently generate 
the hash identity of the cell. We can then use parallel 
primitives discussed in the paragraph above to sup- 
port the computation of a maximal independent set 
of candidates for all non-empty grid cells. □ 

4.1.4 Parallelizing Ruppert's Refinement (PPS) 

In this section, we show that our parallelization of 
Ruppert's method for periodic point sets in 2D takes 
0(log 2 (L/s)) iterations. For simplicity, we give an 
analysis for the case |3r = although our analysis 
can be easily extended to the case when (3r = 1 + e, 
for any e > 0. We recall that (3r is the threshold of 
the ratio of the radius to shortest edge-length defining 
a poorly shaped triangle. Thus, for (3 r = insert- 
ing the circumcenter of a poorly shaped triangle whose 
shortest edge is h. introduces new Delaunay edges of 
length at least \fJh.. 



Algorithm 3 Parallel Ruppert's Refinement 
Input: A periodic point set P in R 

Let T be the Delaunay triangulation of P 
for i=l to [log^jfL/s)] do 

Let C be the set of all circumcenters of poorly- 
shaped triangles who are in class Ei 
while C is not empty do 

Let I be a maximal independent subset of C 
Insert all the points in I in parallel 
Update the Delaunay triangulation and C 
end while 
end for 



Let s be the length of the shortest edge in the initial 
Delaunay triangulation. At each iteration, we assign 
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an edge to class £i if its length is in 
Similarly, we assign a Delaunay triangle to £i if its 
shortest edge has length in yjl s,a/2 s^j . There 
are at most [log^fL/s)] of such classes. Using this 
definition, we can state and analyze the Parallel Rup- 
pert's Refinement Algorithm. 

Theorem 7. Given a periodic point set in 2D of diam- 
eter L, the Parallel Ruppert's Refinement Algorithm 
takes 0(log (L/s)) iterations. 

Proof: Lemmas 8 and 9 prove that after the ith itera- 
tion of the outer loop, each Delaunay triangle touching 
an edge in class £i will be well-shaped, and succes- 
sive iterations cannot degrade the shape of the Delau- 
nay triangles touching that edge. Lemma 10 implies 
that during each iteration of the outer loop, the inner 
loop of the algorithm will execute at most 0(log(L/s)) 
times. As the outer loop is executed 0(log(L/s)) 
times, the whole algorithm takes at most O (log 2 (L/s)) 
iterations. □ 

Lemma 8. During the ith iteration of the outer loop 
of the Parallel Ruppert's Refinement Algorithm, no 
Delaunay edges are added to or removed from class 
Si- 
Lemma 9. Suppose e is an edge in £j where ) < i. 
Then during the ith outer loop, the radius-edge ratio 
of triangles containing e does not increase. 

Lemma 10. Let e € £\, and let T\ be the radius of 
the larger of the two circumcircles containing e at the 
end of the l tH iteration of the inner loop during the 
ith iteration of the outer loop. Then, at the end of 
iteration k = 1 + 81 of the inner loop, either (1) both 
Delaunay triangles containing e are well-shaped, or (2) 
Tic < 3i*i/4 where Tk is the radius of the larger of the 
two circumcircles containing e after iteration k. 

4.2 Input Domain: PSLG 

In this subsection, we extend our parallel algorithm 
for generating a Delaunay mesh from a domain given 
by a periodic point set to a domain defined by a pla- 
nar straight-line graph. Following Ruppert, we assume 
that the angle between two adjacent input segments 
is at least n/2. A key step in Delaunay refinement 
for a domain specified by a PSLG is to properly add 
points to the boundary segments so that the Delaunay 
mesh is conforming to the boundary. In our parallel 
algorithm, we make our mesh conform to the bound- 
ary in two steps: First, we give, in Section 4.2.3, an 
O(log L/s) time parallel preprocessing algorithm to in- 
sert points to input segments so that the initial De- 
launay mesh is conforming to the boundary and no di- 
ametral circle intersects any other non-incident input 



features. Second when a segment is encroached during 
parallel Delaunay refinement, we include its midpoint 
as candidate for insertion. 

The preprocessing step might not be needed to im- 
plement our parallel algorithm: one could probably 
add points to the boundary as needed. However, the 
preprocessing step simplifies our analysis in this Sec- 
tion by greatly reducing the number of cases in the 
analysis. 

4.2.1 A generic parallel algorithm (PSLG) 

After applying the preprocessing step, the initial De- 
launay triangulation is conforming to the boundary 
and no diameter circle contains any point of the tri- 
angulation. We will maintain this invariant in our 
algorithm. 

In order to perform parallel refinement, as in Sec- 
tion 4.1.1, we need a rule of independence among 
candidates for refining boundary segments and poorly 
shaped triangles. We first recall the set of candidates 
for insertion defined in Section 3. 

Let B be the set of circumcircles of poorly shaped tri- 
angles whose centers B encroach some boundary seg- 
ments. Let C be the set of circumcircles of poorly 
shaped triangles whose centers C don't encroach any 
boundary segments. Let T> is the set of diametral cir- 
cles that are encroached by some centers in B. So, 
C U T> are candidate points for insertion. 

We will still apply Definition 1 to determine whether 
two circumcenters from C are independent. Because 
the angle between two adjacent input segments is at 
least n/2, after preprocessing, any two diametral cir- 
cles from T> are not overlapping. Every two diametral 
centers from B are independent. 

We will use the following definition of independence 
between a diametral center in T> and a circumcenter 
in C. Note that because a circumcenter in C does not 
encroach any boundary segment, a diametral circle of 
T> does not contain any center in C. 

Definition 11. A circumcenter c G C and a diametral 
center d G T> are conflicting if (i) d is inside c; and (ii) 
the radius of c is smaller than s/l times the radius of 
d. Otherwise, c and d (also c and d) are independent. 

This definition of independence is motivated by the 
following lemma first proved by Ruppert [29]. 

Lemma 12. If a circumcircle c of radius r c encroaches 
a diametral circle d of radius ra, then ra > r c /\/2. 

The following theorem extends Theorem 2 for domains 
given by PSLGs. 
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Algorithm 4 Generic Parallel Delaunay Refinement 

Input: A domain CI given by a PSLG in R 2 

Apply the parallel preprocessing algorithm of Sec- 
tion 4.2.3 

Let T be the initial Delaunay triangulation. 
Compute BC, an independent subset of B U C 
Let T> be the set of centers of diametral circles en- 
croached by the centers in BC 
while (BC n C) U V is not empty do 

Let I be an independent subset of {BC CI C) U T> 

Insert all the points in T in parallel 

Update the Delaunay triangulation 

Update B, C, BC and t> 
end while 



Theorem 13. For a domain CI specified by a PSLG, 
suppose M is a mesh produced by an execution of the 
parallel algorithm above. Then M can be obtained 
by some execution of one of the sequential Delaunay 
refinement algorithms discussed in Section 3. 

4.2.2 Parallelizing Chew's Refinement (PSLG) 

To parallelize Chew's algorithm for domain defined by 
a PSLG, we apply Algorithm 4 and use a maximal in- 
dependent set of the candidates at each iteration. In 
addition, because each pair of diametral centers in T> 
is independent, we include all centers T> in the inde- 
pendent set. The parallel algorithm of Section 4.1.3 
can be used to construct the maximal independent set. 

Theorem 14. Our parallel implementation of Chew's 
refinement algorithm takes 0(log(L/s)) iterations for 
a domain given by a PSLG, where L is the diameter 
of the domain and s is smallest local feature size. 

4.2.3 Parallel Preprocessing 

In the algorithm and proof presented in the last sub- 
section, we assume that the boundary of the domain 
has been preprocessed to satisfy the following prop- 
erty. 

Definition 15 (Strongly Conforming). A domain CI 
specified by a PSLG is strongly conforming if no di- 
ametral circle contains any vertex or intersects any 
other non-incident input features. 

Clearly, if CI is strongly conforming, then the Delau- 
nay triangulation of the vertices of CI is conforming to 

a. 

We will use the following parallel method to prepro- 
cess a domain CI to make it strongly conforming. This 
method repeatedly adds midpoints to boundary seg- 
ments whose diametral circles intersect non-incident 
input features. 



Algorithm 5 Parallel Boundary Preprocessing 

Input: A PSLG domain O in R 2 

Let Q be the set of segments in CI whose diametral 
circles intersect non-incident input features, 
while Q is not empty do 

Split all the segments in Q in parallel by midpoint 

insertion and update Q. 
end while 



Lemma 16. Parallel Boundary Preprocessing termi- 
nates in 0(log(L/s)) iterations. 

In the scheme above, we can grow a quadtree level 
by level to support the query of whether the diame- 
tral circle of a segment intersects another non-incident 
feature. The number of levels of the quadtree that we 
need to grow is at most log(L/s). As shown in [2, 3], 
one can use balanced quadtree to approximate local 
feature size function of CI to within a constant fac- 
tor. Therefore, using a balanced quadtree as [2, 3], we 
can preprocess the domain in log(L/s) parallel time 
so that the preprocessed domain is strongly feature 
conforming as defined below. 

Definition 17 (Strongly Feature Conforming). Let 

a > 2 be a constant. A domain CI specified by a 
PSLG is strongly feature conforming with parameter 
« if it is strongly conforming, and in addition, the 
length of each segment is no more than oc times the 
local feature size of its midpoint. 

In the next subsection, we will present a parallel im- 
plementation of Ruppert's algorithm for domains that 
are strongly feature conforming and show that it ter- 
minates in 0(log (L/s)) iterations. 

We use the following lemma to show that the size op- 
timality of our results are not affected much by the 
preprocessing. 

Lemma 18. Let CI and CI' denote the input before 
and after preprocessing, respectively. Then, for any 
point x in these domains, lfsn(x)/3 < lfs n '(x) < 
lfsn(x). 

4.2.4 Parallelizing Ruppert's Refinement (PSLG) 

In this section, we show that our parallelization of 
Ruppert's method for a domain given by a PSLG takes 
O(log 2 (L/s) ) iterations. Again, for simplicity, we will 
only give an analysis for the case when |3r = \fl. 

The parallel algorithm follows basic steps of the paral- 
lel Ruppert's Refinement presented earlier in Section 
4.1.4. But first, we apply the parallel preprocessing 
algorithm of Section 4.2.3 so that the preprocessed 
domain is strongly feature conforming. So below we 
can assume that CI is strongly conforming. 
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Let s be smallest local feature of Q. At each it- 
eration, we assign an edge to class ft if its length 
is m Similarly, we assign a Delau- 

nay triangle to £i if its shortest edge has length in 
There are at most [log^ (L/s)] of 
such classes. 

Algorithm 6 Parallel Ruppert's Refinement 
Input: A domain CI given by a PSLG that is strongly 
feature conforming. 

Let T be the initial Delaunay triangulation. 

for i=l to [log^fL/s)] do 

Let B be encroaching candidate circumcenters 
and C be the non-encroaching candidate circum- 
centers whose triangles is in class £\- 
Compute BC, an independent set of B U C. 
Let V be the set of centers of diametral circles 
encroached by the centers in BC 
while [BC nC)UPis not empty do 

Let T be an maximal independent subset of 

(BCnC)uv 

Insert all the points in T in parallel 
Update the Delaunay triangulation 
Update B, C, BC and t> 
end while 
end for 



Theorem 19. Given a domain specified by a PSLG, 
the Parallel Ruppert's Refinement Algorithm takes 
O ( log 2 ( L/ s ) ) iterations. 

The proof of Theorem 19 is essentially the same as 
the proof of Theorem 7 where we need to address the 
following two issues. 

1. The center of a circumcircle could potentially en- 
croach a boundary segment whose length is much 
larger than that the circumradius. 

2. The insertion of a midpoint on the boundary 
could potentially introduce smaller edges. 

To address the first issue, we apply parallel process- 
ing algorithm of Section 4.2.3 and hence assume CI is 
strongly feature conforming. Hence if a circumcenter 
encroaches a boundary segment, the circumradius and 
the length of the segment are with a constant factor of 
each other. In addition, because each boundary seg- 
ment can only be split at most a constant times in 
the refinement, it can not introduce smaller edges too 
many times. 

5. 3D DELAUNAY REFINEMENT 

A 3D domain is specified by a PLC (see Section 2.1). 
In this section, we assume that the angle between any 



two intersecting elements, when one is not contained 
in the other, is at least 90°. There are three kinds 
of spheres associated with a 3D Delaunay mesh that 
we are interested: the circumspheres, the diametral 
spheres, and the equatorial sphere given below. 

Definition 20. The equatorial sphere of a triangle in 
3D is the smallest sphere that passes through its ver- 
tices. A triangular subfacet of a PLC is encroached if 
the equatorial sphere is not empty. 

Chew's algorithm extends naturally to 3D. In [34], 
Shewchuk developed a 3D extension of Ruppert's al- 
gorithm. In Shewchuk's refinement, given below, a 
tetrahedron is bad if the ratio of its circumradius to 
its shortest edge, referred as the radius-edge ratio, is 
more than a pre-specified constant (3s > 2. 

Algorithm 7 3D Delaunay Refinement 

Input: A PLC domain CI in E 3 

Compute T, the Delaunay triangulation of the 
points of CI 

Let C be the set of non-encroaching circumcenters 
of the bad tetrahedra 

Let D be the set of non-encroaching equatorial cen- 
ters of the encroached triangular subfacets 
Let E be the set of diametral centers of the en- 
croached subsegments. 

while there is center a in C U D U E is not empty do 
Insert a and update the Delaunay triangulation 
Update C, D, and E 

end while 



5.1 Parallel 3D Delaunay Refinement 

In this subsection, we show that our results for a do- 
main given by a periodic point set can be extended 
from two dimensions to three dimensions to parallelize 
both Chew's and Shewchuk's algorithm. So far, we 
have not completed the analysis for domains specified 
by PLCs, although we think similar results can be ob- 
tained. 

The following is a parallel Delaunay refinement algo- 
rithm for domains specified by 3D periodic point sets. 

To parallelize Chew's 3D refinement, we use a maxi- 
mal independent set of candidate centers in Algorithm 
8. With almost the same proof as we have presented 
in Section 4.1.2, we can show that the number of iter- 
ations needed is 1076 log(L/s). 

Theorem 21. Suppose M is a mesh produced by an 
execution of the 3D Generic Parallel Delaunay Refine- 
ment algorithm. Then M. can be obtained by some 
execution of the sequential Delaunay refinement algo- 
rithm. 
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Algorithm 8 Generic 3D Parallel Delaunay Refinement 

Input: A periodic point set P in R 3 

Let T be the Delaunay triangulation of P 
Compute C, the set of circumcenters of bad tetra- 
hedra in T 

while C is not empty do 

Let T be an independent subset of C 

Insert all the points in I in parallel 

Update C 
end while 



5.1.1 Parallelizing Shewchuk's Refinement 

We will present our analysis for the case when |3r = 
\/2, although our analysis can be easily extended to 
the case when |3r = 1 + e, for any e > 0. Thus, 
for |3r = a/2, inserting the circumcenter of a poorly 
shaped triangle whose shortest edge is h. introduces 
new Delaunay edges of length at least \/lh. 

Let s be the length of the shortest edge in the initial 
Delaunay triangulation. At each iteration, we assign 
an edge to class Si if its length is in 
Similarly, we assign a Delaunay tetrahedra to £t if its 
shortest edge has length in . There are 

at most [log^j (L/s)] of such classes. 

Our parallel implementation of Shewchuk's algorithm 
is analogous to our parallel implementation of Rup- 
pert's algorithm. In addition, our proof in 3D is also 
analogous to the proof in 2D. 

Theorem 22. For a given periodic point set P in R 3 
of diameter at most L, if the length of the short- 
est edge in the mesh generated by Shewchuk's refine- 
ment is s, then parallel Shewchuk refinement takes 
0(log 2 (L/s)) iterations to generate a bounded radius- 
edge ratio mesh. 

6. DISCUSSION 

Polylogarithmic upper bounds on the number of par- 
allel iterations presented in Sections 4 and 5.1 consti- 
tutes the main component of the analyses of our paral- 
lel algorithms. At each iteration, our algorithms per- 
form two main operations: i) compute a maximal inde- 
pendent set of points for parallel insertion; ii) update 
the Delaunay triangulation inserting all these points. 
For the first one, we proposed a new constant time 
parallel algorithm. For the second, we suggested to 
use an existing logarithmic time parallel Delaunay tri- 
angulation algorithm. These immediately imply poly- 
logarithmic total time complexity for our parallel De- 
launay refinement algorithms. 

We opted for simplicity in our analyses. So, the con- 
stants in lemmas 4 and 10 are probably not optimal 



and likely to be much smaller in practice than 98 and 
81. 

The 3D extension of Chew's and Shewchuk's algo- 
rithms do not always guarantee that the resulting 
mesh has an aspect-ratio bounded by a constant. How- 
ever, they both guarantee a constant bound on the 
ratio of the circumradius to the length of the short- 
est edge (the radius-edge ratio) of any tetrahedra in 
the final mesh. So, the meshes these two algorithms 
generate might potentially contains slivers, which are 
elements with close to zero aspect-ratio but with a 
constant radius-edge ratio. Several quality enhanc- 
ing and guaranteeing meshing algorithms [5, 7, 14, 19] 
have been developed recently. Cheng et al. [5] and 
Edelsbrunner et al. [14] have already given parallel 
complexity of their sliver removal algorithms. Our 
framework can be used to analyze parallel complexity 
of the other two algorithms, by Chew [7] and Li and 
Teng [19]. 

We conclude the paper with two conjectures. 

• There is a parallel implementation of Ruppert's 
[29] and Shewchuk's [34] algorithm that runs in 
0(log(L/s)) iterations. 

• There is a parallel Ruppert's [29] and Shewchuk's 
[34] algorithm that runs in O(logn) time where n 
is the input complexity. Notice that Bern et al. 
[3] showed that the quadtree algorithm can be 
implemented in O(logn) time with K processors. 

We would also like to see results that establish the par- 
allel complexity of other mesh generation algorithms 
such as sink insertion [13]. 
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